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1 Introduction 

Two of the most important areas in computational finance: Greeks and, respectively, calibration, are based 
on efficient and accurate computation of a large number of sensitivities. This paper gives an overview of 
adjoint and automatic(algorithmic) differentiation techniques to calculate those sensitivities. While only 
recently introduced in the field of computational finance, it was successfully employed in the last 20 years 
in other areas, such as computational fluid dynamics, meteorology and atmospheric sciences, engineering 
design optimization, etc: [25, 27, 42, 40, 46, 47, 33, 61, 68], to mention but a few. 

The computation of the sensitivities is done by performing differentiation at either continuous level or, 
respectively, at discrete level. The "continuous" adjoint approach corresponds to the case where the adjoint 
equation is formulated at the differential equation level, and then discretized. In contrast, the "discrete" 
adjoint approach starts with discretized equations, and then formulates the corresponding discrete adjoint 
equations. Consequently, the continuous adjoint is also known under "differentiate then discretize" moniker, 
while the discreet adjoint corresponds to "discretize then differentiate" moniker. 

The discrete adjoint approach is preferred in many occasions, for several practical reasons: 

• constructing the discrete adjoint equations is a more straightforward and systematic process 

• Automatic Differentiation software can be employed to greatly reduce the development time 

For this reason we will concentrate in this paper on the discrete adjoint approach, and for ease of presen- 
tation we refer to it as the adjoint approach. However, since a few papers that are reviewed here are also 
based on continuous adjoint, we will make that information explicit when we refer to those papers, and 
use the abbreviation ContAdj for the "continuous adjoint" approach. 

Automatic Differentiation (AD), also known as Algorithmic Differentiation, is a chain-rule-based tech- 
nique for evaluating the derivatives with respect to the input variables of functions defined by a high-level 
language computer program. AD relies on the fact that all computer programs, no matter how compli- 
cated, use a finite set of elementary (unary or binary, e.g. sin(-), sqrt(-)) operations as defined by the 
programming language. The value or function computed by the program is simply a composition of these 
elementary functions. The partial derivatives of the elementary functions are known, and the overall 
derivatives can be computed using the chain rule. 

AD has two basic modes of operations, the forward mode and the reverse mode. In the forward mode 
the derivatives are propagated throughout the computation using the chain rule, while the reverse mode 
computes the derivatives for all intermediate variables backwards (i.e., in the reverse order) through the 
computation. In the literature, AD forward mode is sometimes referred to as tangent linear mode, while 
AD reverse mode is denoted as adjoint mode. 

The adjoint method is advantageous for calculating the sensitivities of a small number of outputs with 
respect to a large number of input parameters. The for- ward method is advantageous in the opposite 
case, when the number of outputs (for which we need sensitivities) is larger compared to the number of 
inputs. 

When compared to regular methods (such as finite differencing) for computing sensitivities, AD has 2 
main advantages: reduction in computational time and accuracy. The reduction in computational time is 
assured by a theoretical result [43]that states that the cost of the reverse mode is smaller than five times 
the computational cost of a regular run. The computational cost of the adjoint approach is independent 
of the number of inputs for which we want to obtains the sensitivities with respect to, whereas the cost 
of thetangent linear approach increases linearly with the number of inputs. Regarding accuracy, AD 
computes the derivatives exactly (up to machine precision) while finite differences incur truncation errors. 



The size of the step h needed for finite difference varies with the current value of of input parameters, 
making the problem of choosing h, such that it balances accuracy and stability, a challenging one. AD on 
the other hand, is automatic and time need not be spent in choosing step-size parameters, etc. 

AD software packages can also be employed to speed up the development time. Such tools implement 
the semantic transformation that systematically applies the chain rule of differential calculus to source 
code written in various programming languages. The generated code can operate in forward or reverse 
mode (tangent linear or adjoint model). 

2 General description of the approach 

Let us assume that we have a model dependent on a set of input parameters which produces an output Y. 
We also denote by Z = {X% , Xi ,■■■, Xn } the vector of input parameters with respect to which we want to 
compute sensitivities. 



2.1 Single function 

We assume we have a single function of the model output, denoted F{Y). The goal is to obtain sensitivities 
of F with respect to the components of Z. 
Computation of the vector of sensitivities 
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(2.1) 



is done using the forward (tangent linear) mode. If we include the trivial equations X\ = X±, ■ ■ ■ ,Xn = 
Xn, then we can rewrite the combined expressions in matrix notation. This will prove helpful when 
constructing the adjoint mode, using the fact that adjoint of a matrix A is defined as its conjugate 
transpose (or simply the transpose, if the matrix A has only real elements) 
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To compute each component of the vector ^ we need to evaluate the expression (2.1) a number of N 

times, every time with a different input vector Z = < X\, . . . , Xn > .For example, to compute the derivative 

with respect to Xj, the vector Z has the value Z = (0, 0, . . . , 1, 0, . . . , 0), with the only nonzero element 
in the j — th position. 

To construct the reverse (adjoint) mode, we start with the transposed matrix equation (2.2) With the 
notation of A for the adjoint variable corresponding to an original variable A, we have 
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Consequently, we obtain the following expressions 

X, -*, + §-F (2.3) 

OF - 

Xn = Xn + rs V F 
dX N 

Thus we can obtain the sensitivities in one single run of (2.3), with F = 1 and the adjoint variables 
X\, . . . , Xn initialized to zero. 

2.2 Composite functions 

We can generalize this procedure if the output is obtained through evaluation of a composite function of 
P single functions (which is in fact how the computer codes are represented): 

F = F p oF p - 1 o---oF 1 (Z) 

We apply the tangent linear mode to each F 3 , and we combine them in the recursion relationship. For 
the adjoint (reverse mode), we construct the adjoint for each F J , and we combine them in reverse order. 
Let us describe the process using the matrix notation. If we view the tangent linear as the result of the 
multiplication of a multiplication of a number of operator matrices 

MAT = Mail ■ Mat 2 ■ ■ ■ Mat P 

where each matrix Matj represents either a subroutine or a single statement, then the adjoint approach 
can be viewed as a product of adjoint subproblems 

Mat T = Mat T P ■ Matp_ x ■ ■ ■ Mat[ 

Let us describe how it works through an example for P=2. The computational flow is described by the 
following diagram 

Z -> F 1 (Z) -> F 2 {F 1 {Z)) -»■ Y 

For simplicity, we denote by A the output of F 1 (Z) and by B the output of F 2 (F 1 (Z)) . Y is the 
scalar that is the final output. For example, Y can be the value of the function to be calibrated (in the 
calibration setup) or, respectively, the price of the option (in the pricing setup, e..g. using Monte Carlo 
or finite differences). With these notations, we have 

Z ^ A^B ^Y 

Applying tangent linear methodology (essentially differentiation line by line) we have 
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Putting everything together, we get 

■ _ dYdBdA- 

~ OBdAdZ 

Using notation from AD literature, the adjoint quantities Z, A, B, Y denote the derivatives of Y with 
respect to Z,A,B and, respectively, to Y . We note that this implies that Y = 1. Differentiating again, 
and with a superscript T denoting a matrix or vector transpose, we obtain 

" _ f dY Y _ fdYdA\ T _ fdA\ T - 

\dz) -{dldzj ~{dz) 



In a similar way, we obtain 



a = [ b S\ t = (SL b SS [ = ( b S\ t b 



dAJ \dBdAJ \dAJ 
\dBJ \dBj \dBJ 



Putting everything together, we get 
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dZJ \dAJ \dBJ 
We notice that the tangent linear approach proceeds forward (in forward mode)through the process 

Z ^ A^B ^Y 
while the adjoint approach proceeds backwards (in reverse mode) 

Y -^ B ^ A^tZ 

2.3 Checking the correctness of the implementation 

There are several checks that need to be made [61, 4, 39]. 

First, at any level of the code, the development of the discrete adjoint model can be checked by appling 
the following identity 

(AQf (AQ) = Q T [A T (AQ)] 

where Q represents the input to original code, and A represents either a single statement or a subroutine 
We compare the gradient computed using AD to the gradient computed by Finite Difference (with a 

step size such that sufficient convergence is obtained). We also compare the gradient computed using AD 

in Forward mode versus the gradient computed using AD in Adjoint mode. We expect those two gradients 

to be identical up to machine precision. 

If possible, we may use complex numbers [39, 66], to avoid roundoff errors due to computations with 

finite difference. 



3 Simple example 

We want to compute the gradient of the function f(a,b,c) 
following sequence of statements 



(w — wq) , where w is obtained using the 
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sin (ab) + cb z + a 6 c 

exp (u — l) + a 

In (v 2 + 1) + cos (c 2 - l) 



The input vector is denoted by by z = {a, b, c}, intermediate variables u, w, v and output /. 
We show how the sensitivities with respect to a,b,c, namely 

d£ d£ d£ 

da : db : dc 

are computed in both forward and adjoint mode. For forward mode we also write the expressions in matrix 
notation, to make it easier to understand how the adjoint mode is constructed. 

3.1 Forward (tangent linear) mode 

We follow the computational flow, and thus we start by getting the sensitivities with respect to the 
intermediate variables, denoted by u,v,w 
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Hence we obtain 



[6cos(a6) + 3aV] a + [acos(afr) + 2cb] b + [b 2 + 2a 3 c] c 



In matrix notation 
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In a similar way we obtain 
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In matrix notation 
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To obtain the required sensitivity with respect to j — th component of input vectors, we evaluate the 
above expressions starting with z which has the j — th component set to 1 and all the other components 
set to 0. 

More specifically, the sensitivities at a given set of variables z^' = {a^ ' , b(°',c(°'} are computed by 
calling the forward mode with the initial value z defined as follows: 

da 
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df 
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f computed with z = la,b,c) = (1,0,0) 
/ computed with z= (d,b, c) =(0,1,0) 
/ computed with z= (d, 6, c) =(0,0,1) 



3.2 Adjoint (reverse) mode 

To write the Adjoint, we need to take statements in reverse order. Employing this approach to the 
statements of the forward mode yields 

(to) = (2 (w- w )) {f) 
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Thus the adjoint (reverse) mode is constructed using the following sequence of statements 
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We can compute all 3 sensitivities of function / with respect to a, 6, c by a single application of the 
adjoint mode (3.1), with starting point z = 1. More specifically, we have 



z ( ' I = a computed with a = 1 
(2' ') = 6 computed with 6=1 



06 

z^ ) = c computed with c = 1 
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The reader is encouraged to verify that the gradient computed via the Forward mode or, respectively, 
the Reverse mode has identical value. 

4 Example in PDE solver framework 

The following PDE is considered for u(t,x), on the spatial domain^ < x < B 

du _ du 

dt dx 

u(0,x) = Uq(x) 
u(t,A) =f(t) 
u(t,B) =g(t) 

We discretize the PDE in space and time, for a spatial grid {x^jand time grid {T } We use notation 
of superscript for time index and subscript for spatial index. For simplicity of exposition, we discretize 
using a central difference scheme in space, a first order scheme in time, and we consider that both spatial 
grid and, respective, temporal grid are constant, with Ax and At 



We denote by c the ratio -^Kx- With that, and incorporating the boundary conditions, we have 

u\ +1 = /(T fc+1 ) = / fc+1 

u) +1 = uJ + c(uJ +1 -2«} + «J- 1 ) j = 2...N 

u k N +1 = g(T k ^)±fg k+1 

2 



We want to minimize the difference F = Ylj=i \ U Y ~ ^j ) > with Yj desired values to get at time T 
We want to compute sensitivities of F with respect to the discretized initial condition s ^9 [■ , where 



a / \ 
Uj = U {Xj) 

4.1 Forward (tangent linear) mode 

The tangent linear approach has the expression 

u) +l = (1 - 2c) ii) + c (ii k j+1 + uJLj) j = 2...N 



In matrix notation 
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The last step is 
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In matrix notation 
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4.2 Adjoint (reverse) mode 

We go backwards, and the starting point is (4.2). The corresponding adjoint statements are 
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(4.2) 



Then we take the transpose of the matrix operator in (4.1). The corresponding adjoint statements are, 
for Jfc = M,M -!,...,!, 
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The required sensitivities are given by vS, for j = 1...N 
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5 Example in Monte Carlo framework (evolution of SDE) 

We consider the SDE for a N-dimensional X vector 

dX = a(X,t)dt + a(X,t)dW 

The initial value for X, at time t=0, is denoted byX° = (Xi(T°) , ..., X N (T )) 

For simplicity of exposition, we consider that we discretize it using Euler-Maruyama scheme, with time 
points T k , k = 1...M and T° = 



X(T 



k+l^ 



X(T k ) 

AT k 

AW k 



a(X{T k ),T k )At + a{X(T k ), T k )AW k 

rpk-\-l rpk 



£ . yjT k + l - T k 



where the random number is chosen from A/"(0,1) 

We can recast this discretization scheme into the following expression 

X(T k+1 ) = $(X(T k ),e) 



(5.1) 



where $ may also depend on a set of model parameters G = {#i, ..., 9p} 
We also consider that we have to price an option with payoff G(X(T)) 

We want to compute price sensitivities ("Greeks") with respect to X° and, respectively, to the model 
parameters G 

5.1 Forward (tangent linear) mode 

We consider first the sensitivities with respect to initial conditions 



dG{X(T)) 
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dG(X(T)) dXj{T) 
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where we have used notation Aij(T°,T k ) = 9X J (TO l 
We rewrite (5.2) in vector matrix notation 

,-iT 



dG(X{T)) 
dX(T°) 



dG(X(T)) 
dX{T) 



A(T°,T) 



For simplicity we write the previous relationship in the following format, using the fact that T 



M 



dG r , 



where the superscript T denotes the transpose 
Differentiating (5.1) yields 
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dX(T k+1 ) _ d$(X(T k ),e) 
dX (T k ) ~ dX (T k ) 
dX (T k+1 ) _ dX (T k+l ) dX (T k ) 
dX (T°) dX(T k ) dX(T°) 



(5.3) 



9*(x(T fc ),e) dx{r k ) 

dX(T k ) dX(T") 
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dX (T k ) 
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where D[k] 



A d$(x(T fc ),e) 
dx(r k ) 



We rewrite (5.3) as 



Then we have an iterative process 



A[k + l] =D[k] -A[k] 
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(5.4) 



8X l J 
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D[M - 1] ■ D[M - 2] • A[M - 2] 



D[M-1] ■■■■D[l] • A[l] 



§f« 



•D[M-1] ■■■■D[0] ■ A[0] 
The matrix A[0] is the identity matrix, since 



\ 



\dX k {T*)) 



jk 
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(5.5) 



• 1 
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The iterations in (5.4) employ matrix-matrix product. We will see later that the adjoint mode will 
involve matrix-vector product instead, which will offer important computational savings. 
Now we move to the sensitivities with respect to model parameters 



dG(X(T)) 



A dG(X{T)) dXj(T) _ y, dG(X(T) ). 
fri~8X~(T] dl^-j^ dXj(T) 

nk\ A dXjiT*) 



*J P (T) 
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where we have used notation *$>j p (T 

Differentiating (5.1) with respect to parameters yields 



dX (T fc+1 ) _ d$> (X(T k ), 6) 8X(T k ) d$ (X(T k ), Q 
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(5.7) 



Making the notations D[k] — — ^ < J ' and B[k] — — - g e — — for the corresponding matrices, we 



rewrite (5.7) as 



Then we have an iterative process 



tf[fc + l] = D[k] -^[k]+B[k] 



dG(X{T)) 
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dG{X(T)) 
dX(T) 



*[M] 
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dX{T) 



i T 



(D[M - 1] ■ W[M - 1] + B[M - 1]) (5.8) 
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1 T 
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5.2 Adjoint (reverse) mode 

We construct first the adjoint for computing the sensitivities with respect to initial condition. We start 
with the adjoint equation 

V[k] = D T [k]-V[k + l] (5.9) 

where the superscript T denotes the transpose 
In a recursive manner we obtain 

V[0] = D T [0] ■ V[l] = D T [0] ■ D T [1] ■ V[2] = ■■■ = D T [0] ■ D T [1] ■■■■ D T [M - 1] • V[M] (5.10) 

By taking transpose of (5.10) we have 

V T [0] = V T [M] ■ D[M - 1] • D[M - 2] • • • • D[l] ■ D[0] (5.11) 

We set V[M] to the value (|jy [-&/]) and we combine (5.11) and (5.4), which gives 

ff[0] = ^[0].A[0] 

But the matrix A[0] is the identity matrix, according to 5.5 

Thus we obtain the sensitivities with respect to initial conditions, namely w[0] by applying the recursive 
relationship (5.11) to find V [0]. We note that the product in this iterative process is of the matrix-vector 
type, not matrix-matrix as it was for tangent linear mode 

Now we move to sensitivities with respect to model parameters 

We use again the adjoint equation (5.9). With same initial condition for V[M] , namely V^M] = 
(w[-W]) , and evolving from time T to time T we have that 

dG(X(T)) _ _ _ 2] • . . . DIM - k] • B[M - k - 1] = V[M - k] 

Thus we can rewrite (5.8) as 

^^=JlV T [k + l].B[k] (5.12) 

The values of adjoint vectors V[k] were computed as part of the adjoint approach for sensitivities with 
respect to initial conditions. 

The values of B[k] can be precomputed. We may be able to compute them analytically (e.g., if the 
parameters are volatilities and vol surface is assumed to have a certain parametric representation, such as 
cubic spline), otherwise we can employ the adjoint procedure. 

6 Example in Monte Carlo framework (copula) 

We follow a procedure similar to the one described in [20] 

Let us describe the setup. We have a state vector X of N components, an instrument with a payoff 
.P(X), and the probability distribution Q(X) according to which the components of X are distributed. 
For simplicity we consider a N-dimensional Gaussian copula to model the co-dependence between the 
components of the state vector, namely a joint cumulative density function of the form 

$N [$i (<pi (Xi) , • • • , f N (X N )) ; p] 
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where $n[Zi,- ■ ■ ,Zj\f,p] is a N-dimensional multivariate Gaussian distribution with zero mean and 
a correlation matrix p, $~ 1 is the inverse of the standard normal cumulative distribution and (fi (Xi), 
i=l...N are the marginal distributions of the underlying components. 

The option value is obtained through averaging of Monte Carlo sampling 



1 N MC 

v = —— V p ( ^ {i[ 



N M o 

N M c j=1 

where the superscript W denotes the i— th Monte Carlo path. 

The sampling of the N jointly distributed normal random variables (Z\, Z2, • • • , Zjy) can be done using 
several approaches, such as Cholesky factorization, spectral or singular value decomposition. We select 
Cholesky factorization 

p = C ■ C T because it will enable us to use an existing procedure for its adjoint. Starting from a vector 
Z of independent standard normal variables, we obtain a vector Zof jointly normal random variables 
distributed according to <3?tv [Z\, ■ ■ ■ , Zj^; p] through the product 

Z = C -Z 

We also use the following: 

• if Zi is sampled from a standard normal distribution then <1> (Zi) is in hi [0,1] 

• if X{ is distributed according to marginal distribution (pi, then cp (Xi) is in U [0,1] 

Then Xi = tp~ l (<£ (Zi)) is distributed according to marginal distribution ipi 
Therefore we have the following algorithm [20] 

1. Generate a vector S of independent standard normal variables 

2. Correlate the components through Z = C • H 

3. Set Ui = <S>(Zi), i = 1...N 

4. Set X % = ipr 1 ($ (Z)) = p~ x (Ui) , i = 1...N 

5. Calculate the payoff estimator P (X\, ...,Xn) 

We now show how sensitivities can be computed in this setup. 

The correlation matrix is an input to the procedure, so we may compute correlation risk, i.e., sensitivities 
of the price with respect to entries in the correlation matrix 

These marginal distributions may be obtained from a set of N MARG discrete call, put, digital call 
and digital put values (which may be given as corresponding implied volatilities). We may also compute 
sensitivities with respect to those inputs, denoted by Wj,j = l...N M RG 

6.1 Forward (tangent linear) mode 

Assuming that the payoff function is regular enough (e.g., Lipschitz continuous) the standard pathwise 
differentiation corresponds to forward (tangent linear) mode. The differentiation is applied to steps 1-5 in 
the above algorithm. We need to pay attention if any given step is dependent (implicitly or explicitly) on 
the input parameters with respect to which we want to compute the sensitivities. Step 1 stays the same. 



14 



Step 2 and 3 are differentiated if we want correlation risk, otherwise they remain unchanged. Steps 4 and 
5 are differentiated regardless which of the two types of sensitivities of sensitivities we want to compute. 

To differentiate Step 4 we start from JQ = tp~ (C/j) => Ui = <pi(Xi). 

Differentiating the last equality gives the following formula, if we have the propagated derivative Ui of 
the variable Ui (i.e., we compute the correlation risk) 

U i = ^(X i )X i ^X i 



If we need to compute sensitivities with respect to Wj , then differentiating the same equality gives 



Xi 



"3F V A * 



dx 

We now present the algorithm for tangent linear mode. For ease of presentation we write explicitly the 
algorithm only for the case of correlation sensitivities. 

We assume that we want to compute the sensitivity with respect to entry pi k of the correlation matrix 

1. Generate a vector E of independent standard normal variables 

2. Calculate Z = Cy. ■ 5, where C\ k is the sensitivity of Cholesky factor C with respect to p\ k 

3. Set U t = §f (Zi) , i = 1...N 

4. Set Xi = ttM—, i = 1...N 






N dP 



5. Calculate P = Z?=l ffi& ( x i) " X i 

We note that the derivative of the marginal distribution, denoted by -#i (Xj), is the probability density 
function associated with the marginal distribution ifi, while the derivative @ is the standard normal 
probability density function 

6.2 Adjoint (reverse) mode 

The adjoint mode consists of the adjoint counterparts for each of the steps in the forward mode, plus the 
adjoint of Cholesky factorization [63]. The computation is done in reverse order 

The resulting algorithm consists of the 5 steps described above (in the forward mode) plus the following 
steps corresponding to adjoint counterparts 

1. Calculate the adjoint of the payoff estimator X^ = -Q^-{Xk), k = 1...N 

2. Calculate U k = - . . ** -, k = 1...N 

3. Calculate Z k = U k ^ (Z k ) , k = 1...N 

4. Calculate C = Z ■ E T 

The matrix C = (Cij) ■ ■_-. N obtained at the end of the procedure contains all derivatives of payoff 
estimator with respect to entries Qij of the correlation matrix. We can see that all these sensitivities were 
computed by running the adjoint mode only once, as opposed to the forward (tangent linear) mode, which 
had to be run separately for each entry in the correlation matrix (with a total number of runs of N 2 ) 
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7 Example in calibration/optimization framework 

Let us consider that we have the same SDE as in the previous chapter. We want to minimize a cost 
functional of the type 

F = \ (Model Price[j] — M arketPrice[j]) 

with variables to be optimized being given by model parameters = {9i, ...,9p} 

The gradient of the cost functional with respect to the model parameters would have the expression 

W- k ) == ( 2Z(ModelPrice[j] - MarketPrice[j]) S(ModdPrice\j]) 

The adjoint procedure enables us to compute — - — gg nce u\) j n a s i m il ar wa y to (5.12) 
We note here that the above procedure assumes implicitly the existence of the gradient of the functional. 
It is our experience [47]that the discrete adjoint can still be applied for cases such gradient doe not exist; 
in those cases the numerical adjoint code will provide us not the gradient, but rather subgradients. Conse- 
quently, one will have to employ optimization algorithms that are especially designed to use subgradients 
instead of gradient 

8 Computational finance literature on adjoint and AD 

In the last several years quite a few papers were added to the literature on adjoint /AD applied to com- 
putational finance [11, 22, 20, 21, 24, 23, 32, 30, 31, 41, 50, 48, 49, 54, 52, 55, 56, 65, 3, 9, 18, 67] . For 
selected papers we give an overview in the following sections 

8.1 Computation of Greeks 

A major part of the literature related to this topic is due to Giles, Capriotti and, respectively, Joshi (and 
their collaborators). 

The seminal paper of [41] applied the adjoint approach to the computation of Greeks (Delta and Vega) 
for swaptions using pathwise differentiation method in the context of LIBOR Market Model (LMM). 
The portfolio considered had portfolio had 15 swaptions all expiring at the same time, N periods in the 
future, involving payments/rates over an additional 40 periods in the future. Interested in computing 
Deltas, sensitivity to initial N+40 forward rates, and Vegas, sensitivity to initial N+40 volatilities. The 
computational efficiency was improved by a factor of 10 when the forward method was employed instead of 
finite differences. Then the adjoint approach reduced the cost by several orders of magnitude, compared to 
the forward approach: for N=80 periods, by a factor of 5 for computation of Deltas only, and respectively 
by a factor of 25 for computation of both Deltas and Vegas. 

This was extended in [58] to the pricing of Bermudan-style derivatives. For testing they have used five 
lxM receiver Bermudan swaptions (M = 4, 8, 12, 16, 20) with half-year constant tenor distances. The 
speedup was as follows (for M=20): by a factor of 4 for only Deltas, and by factor of 10 for both Deltas 
and Vegas. 

The pathwise approach is not applicable when the financial payoff function is not differentiable. To 
address these limitations, a combination the adjoint pathwise approach for the stochastic path evolution 
with likelihood ratio method (LRM) for the payoff evaluation is presented in [36, 38]. This combination 
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is termed "Vibrato" Monte Carlo. The Oxford English Dictionary describes "vibrato" as "a rapid slight 
variation in pitch in singing or playing some musical instruments". The analogy to Monte Carlo methods 
is the following; whereas a path simulation in a standard Monte Carlo calculation produces a precise value 
for the output values from the underlying stochastic process, in the vibrato Monte Carlo approach the 
output values have a narrow probability distribution. 

Applying concepts of adjoint/AD for correlation Greeks were considered in [20]. The pricing of an 
instrument based on N underlyings is done with Monte Carlo within a Gaussian copula framework, which 
connects the marginal distributions of the underlying factors. The sampling of the N jointly normal 
random variables is efficiently implemented by means of a Cholesky factorization of the correlation matrix. 
Correlation risk is obtained in a highly efficient way by implementing the pathwise differentiation approach 
in conjunction with AD, using the adjoint of Cholesky factorization. Numerical tests on basket default 
options shows a speedup of 1-2 orders of magnitude (100 times faster than bumping for 20 names, and 
500 times for 40 names). 

Examples of pathwise differentiation combined with AD are shown in [18]. For a basket option priced 
in a multidimensional lognormal model, the savings are already larger than one order of magnitude for 
medium sized baskets (N=10). For "Best of" Asian options in a local volatility setting, the savings are 
reported to over one order of magnitude for a relatively small number (N=12) of underlying assets. 

Adjoint algorithmic differentiation can be used to implement efficiently the calculation of counterparty 
credit risk [22]. Numerical results show a reduction by more than two orders of magnitude in the com- 
putational cost of Credit Valuation Adjustment (CVA). The example considered is a portfolio of 5 swaps 
referencing distinct commodities Futures with monthly expiries with a fairly typical trade horizon of 5 
years, the CVA bears non-trivial risk to over 600 parameters: 300 Futures prices, and at the money volatil- 
ities, (say) 10 points on the zero rate curve, and 10 points on the Credit Default Spread(CDS) curve of 
the counterparty used to calibrate the transition probabilities of the rating transition model, required 
for the calculation of the CVA. The computational time for CVA sensitivities is less than 4 times the 
computational time time spent for the computation of the CVA alone, as predicted by AD theory. As a 
result, even for this very simple application, adjoint/AD produces risk over 150 times faster than finite 
differences: for a CVA evaluation taking 10 seconds, adjoint approach produces the full set of sensitivities 
in less than 40 seconds, while finite differences require approximately 1 hour and 40 minutes. 

In the framework of co-terminal swap-rate market model [5 1 ] presented an efficient algorithm to imple- 
ment the adjoint method that computes sensitivities of an interest rate derivative (IRD) with respect to 
different underlying rates, yielding a speedup of a factor of 15 for Deltas of a Bermudan callable swaption 
with rates driven by a 5-factor Brownian motion. They show a total computational order of the adjoint 
approach of order 0(nF), with n the number of co-terminal swap rates, assumed to be driven by numberF 
Brownian motions, compared to the order of standard pathwise method which is 0{rfi). It was extended 
to Vegas in [55], in the context of three displaced diffusions swap-rate market models. A slight modifica- 
tion of the method of computing Deltas in generic market models will compute market Vegas with order 
0(nF) per step. Numerical results for the considered tests show that computational cots for Deltas and 
Vegas will be not more than 1.5 times the computational cots of only Deltas. CMS spread options in 
the displaced-diffusion co-initial swap market model are priced and hedged in [53] . The evolution of the 
swap-rates is based on an efficient two-factor predictor-corrector(PC) scheme under an annuity measure. 
Pricing options using the new method takes extremely low computational times compared with traditional 
Monte Carlo simulations. The additional time for computing model Greeks using the adjoint method can 
be measured in milliseconds. Mostly importantly, the enormous reduction in computational times does 
not come at a cost of poorer accuracy. In fact, the prices and Greeks produced by the new method are 
sufficiently close to those produced by a one-step two-factor PC scheme with 65,535 paths. 
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In the framework of the displaced-diffusion LIBOR market model, [49] compared the discretization 
bias obtained when computing Greeks with pathwise adjoint method for the iterative predictor-corrector 
and Glasserman-Zhao drift approximations in the spot measure to those obtained under the log-Euler 
and predictor-corrector approximations by performing tests with interest rate caplets and cancellable 
receiver swaps. They have found the iterative predictor-corrector method to be more accurate and slightly 
faster than the predictor-corrector method, the Glasserman-Zhao method to be relatively fast but highly 
inconsistent, and the log-Euler method to be reasonably accurate but only at low volatilities. 

In the framework of the cross-currency displaced-diffusion LIBOR market model, [11] employs adjoint 
techniques to compute Greeks for two commonly traded cross-currency exotic interest rate derivatives: 
cross-currency swaps (CCS) and power reverse dual currency (PRDC) swaps. They measure the com- 
putational times relative to the basic implementation of the crosscurrency LIBOR market model, that is 
with standard least squares regression used to compute the exercise strategy. It was reported that, using 
the adjoint pathwise method, the computational time for all the Deltas and Vega for each step and the 
exchange Vega is only slightly larger compared to the computational time required to compute one Delta 
using finite differences. 

Adjoint approach was applied for computation of higher order Greeks (such as Gammas) in [52] and [54], 
where they show that the Gamma matrix (i.e. the Hessian) can be computed in AM + B times the number 
of operations where M is the maximum number of state variables required to compute the function, and 
A,B are constants that only depend on the set of floating point operations allowed.. In the first paper 
numerical results demonstrate that the computing all n(n+l)/2 Gammas for Bermudan cancellable swaps 
in the LMM takes roughly n /z times as long as computing the price. In the second paper numerical results 
are shown for an equity tranche of a synthetic CDO with 125 names. To compute all the 125 finite- 
difference Deltas, the computational time for doing so will be 125 times of the original pricing time (if 
we use forward difference estimates). However, it only takes up to 1.49 times of the original pricing time 
if we use algorithmic differentiation. There are 7,875 Gammas to estimate. If one uses central difference 
estimates, it will take 31,250 times of the original pricing time to achieve this. It only takes up to 195 
times to estimate all the Gammas with algorithmic Hessian methods. 

In the framework of Markov-functional models, [30] demonstrates how the adjoint PDE method can 
be used to compute Greeks. This paper belongs to the ContAdj category. It obtains an adjoint PDE, 
which is solved once to obtain all of the model Greeks. In the particular case of a separable LIBOR 
Markov-functional model the price and 20 Deltas, 40 mapping Vegas and 20 skew sensitivities (80 Greeks 
in total) of a 10-year product are computed in approximately the same time it takes to compute a single 
Greek using finite difference. The instruments considered in the paper were: interest rate cap, cancellable 
inverse floater swap and callable Bermudan swaption. 

In the framework of Heston model, the algorithmic differentiation approach was considered in [24] 
to compute the first and the second order price sensitivities.. Issues related to the applicability of the 
pathwise method are discussed in this paper as most existing numerical schemes are not Lipschitz in 
model inputs. While AD/adjoint approach is usually considered for primarily providing a very substantial 
reduction in computational time of the Greeks, this paper also shows how its other major characteristic, 
namely accuracy, can be extremely relevant in some cases. Computing price sensitivities is done using 
the Lognormal (LN) scheme, the Quadratic- Exponential (QE) scheme, the Double Gamma (DG) scheme 
and the Integrated Double gamma (IDG) scheme. Numerical tests show that the sample means of price 
sensitivities obtained using the Lognormal scheme and the Quadratic-Exponential scheme can be highly 
skewed and have fat-tailed distribution while price sensitivities obtained using the Integrated Double 
Gamma scheme and the Double Gamma scheme remain stable. 

The adjoint/AD approach is also mentioned in the very recent "opus magna" on interest rate derivatives 
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[5, 6, 7], written by 2 well-known practitioners. 

8.2 Calibration 

To the best of our knowledge, [3] was the first time that an adjoint approach was presented for calibration 
in the framework of computational finance 

In [57] adjoint methods are employed to speed up the Monte Carlo-based calibration of financial market 
models. They derive the associated adjoint equation /(and thus are in ContAdj category) and propose its 
application in combination with a multi-layer method. They calibrate two models: the Heston model is 
used to confirm the theoretical viability of the proposed approach, while for a lognormal variance model 
the adjoint-based Monte Carlo algorithm reduces computation time from more than three hours (for finite 
difference approximation of the gradient)to less than ten minutes 

The adjoint approach is employed in [59, 60] to construct a volatility surface and for calibration of local 
stochastic volatility models. The fitting of market bid/ask-quotes is not a trivial task, since the quotes 
differ in quality as well as quantity over timeslices and strikes, and are furthermore coupled by hundreds 
of arbitrage constraints. Nevertheless, the adjoint approach enabled on the fly fitting (less than 1 second) 
for real-life situations: a volatility matrix with 17 strikes and 8 maturities. 

AD is applied to calibration of Heston model in[44], through differentiation of vanilla pricing using char- 
acteristic functions. It takes advantage of the fact that it can be applied to complex numbers (if they are 
"equipped" with auto-differentiation). Applying AD to numerical integration algorithm is straightforward, 
and the resulting algorithm is fast and accurate. 

In [69] the adjoint approach (of ContAdj type) is applied to calibrate a local volatility model 

Calibration of a local volatility model is also shown in [65] 

9 AD and adjoint applied within a generic framework in practitioner 
quant libraries 

As more financial companies become familiarized with the advantages offered adjoint /AD approach when 
applied to computational finance, this approach will be more and more integrated in their quant libraries. 
Very recent presentations [19, 17, 44, 60, 10] at major practitioner quant conferences indicate that this effort 
is under way at several banks, such as Credit Suisse, Unicredit, Nomura, etc. Within this framework one 
may imagine a situation where Greeks are computed (using AD techniques) in real time to provide hedging 
with respect to various risk factors: interest rate, counterparty credit, correlation, model parameters etc. 
Due to complexity of quant libraries, AD tools and special techniques of memory management, check- 
pointing etc can prove extremely useful, potentially leveraging knowledge acquired during AD applications 
to large software packages in other areas: meteorology, computational fluid dynamics etc where such tech- 
niques were employed [43, 25, 45, 4, 39, 15, 8] 

9.1 Block architecture for Greeks 

[10] presents a "Block architecture for Greeks" , using calculators and allocators. Sensitivities are computed 
for each block component, including root-finding and optimization routines, trees and latices. 

Here is an example included in the presentation. First some notations: every individual function 
(routine), say / : lR m — > R n y=f(x)is termed a "calculator" function, and a corresponding "allocator" 
function is defined as 

G r .R*^R™ %(f)=f-J/ 
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If we have a sequence of functions f, g, h, then we calculate the Greeks by starting with the end point 
^y = 1 and working backwards by calling the allocator functions one-by-one in reverse order to the original 
order of the function calls. 

Suppose we have a pricing routine with three subroutines 

• f : Interpolates market data onto the relevant grid from input market objects 

• g : performs some calibration step on the gridded market data 

• h : core pricer which uses the calibrated parameters 

We need to write the three risk allocator functions for these three subroutines. Then we feed the starting 
point 75J7 = 1 into the pricer's allocator to get back the vector of risks to the calibrated parameters. The 
next step is to feed this into the calibrator's allocator to get back the vector of risks to the gridded market 
data. Finally, we feed that risk vector into the interpolator's allocator to get back risk to the original 
input market objects. 

9.2 Real Time Counterparty Risk Management in Monte Carlo 

[19] presents a framework for Real Time Counterparty Risk Management, employing AD to compute 
Credit Valuation Adjustment (CVA), with a reduction in computational time from 100 min (using bump 
and reprice) to 10 seconds, for a a portfolio of 5 commodity swaps over a 5 years horizon (over 600 risks) 

10 Additional resources 

Additional resources include reference books [13, 16, 26, 43], results on matrix differentiation [37, 62, 63], 
the online community portal for Automatic Differentiation [2], various AD tools such as FASTOPT, 
FASTBAD, ADMAT, FAD, ADOL-C, RAPSODIA, TAPENADE [1], papers on rules to construct the 
adjoint code [43, 12, 34, 35, 28, 29, 14, 3, 64] 

11 Conclusion 

We have presented an overview of adjoint and automatic (algorithmic) differentiation techniques in the 
framework of computational finance, and illustrated the major advantages of this approach for computation 
of sensitivities: great reduction of computational time, improved accuracy, and a systematic process to 
construct the corresponding "adjoint" code from existing codes 
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